The efficacy of the CATES system for making training

decisions and determining student proficiency in Naval in-flight 
training proposed in an earlier study (Rankin and McDaniel, 1980) is 
compared with the present system of instructor judgments for 
performance assessment. The current study used 29 newly-designated 
naval aviators undergoing Fleet Replacement Pilot Training in the 
SH-3 aircraft. From an inventory of 190 tasks, 18 tasks were selected 
to evaluate the model. Standard training materials and equipment were 
used, and performance was graded as students proceeded through the 
training syllabus. The task performance information required to reach 
a decision and the level of student proficiency upon completion of 
the training program were then analyzed. Results indicate that the CATES
system requires less information to make a decision than the current
human-judgment system and is reliably more accurate, suggesting that the
greater consistency and accuracy of mathematical models extend to an actual
training situation in a considerably more unstructured environment
than that of previous research studies. This report also provides extensive
details of the statistical decision model used by the CATES system, 
results of other evaluations using mathematical models versus human 
models in decision making, study definitions, research methods, 
results, and conclusions. Six appendices include related materials 
and 30 references. (LMM) 
SECTION I
INTRODUCTION 

Determination of student performance level and, subsequently, decisions 
to either continue or stop training have posed a perplexing problem for 
instructors and training managers. This problem is especially troublesome
for instructors and training managers providing pilot training. In-flight 
training for pilots requires considerable resource expenditures involving 
both highly skilled human resources as well as sophisticated equipment. 
Training is generally accomplished by a one-on-one instructor-student rela- 
tionship. Thus, training continued beyond established training objectives 
is costly. However, termination of training before the student pilot achieves
the skills required of him in the precise aviation environment is also highly 
undesirable. 

Rankin and McDaniel (1980) proposed a Computer Aided Training Evaluation 
and Scheduling (CATES) system for achieving improvements in the precision of 
proficiency judgments and in determining student proficiency during in-flight
training. This method provides a computer managed, prescriptive training 
program based on individual student performance. The CATES system uses a 
proficiency grading system developed by Browning, Ryan, Scott, and Smode 
(1977). These grades are then evaluated as they are awarded, using a sequential
sampling technique as a means for making statistical decisions with a
minimum sample introduced by Wald (1947). According to Rankin and McDaniel 
(1980), the conceptual CATES decision model augurs well with the present 
system of instructor judgments. What remains is to assess the efficacy of 
the CATES decision model using actual data and to determine from this assessment
if the CATES system offers some practical advantage.

PURPOSE 

The objectives of this study are twofold. The first objective is to 
compare the efficacy of the CATES system with the present system of "human 
judgments" for performance assessment in flight training with regard to: 

• efficiency in reaching decisions

• quality of decisions.

Increased efficiency in reaching training decisions (e.g., reduced information
requirements to determine when to stop training) could result in significant
reductions in training costs. Increased quality of training decisions
would produce a more effective utilization of training resources and reduce
the risk of incorrect decisions (e.g., deciding to stop training
when additional training is needed). The second objective is to demonstrate
that the CATES system can be used with some advantage in an actual flight
training program.

ORGANIZATION OF THE REPORT

In addition to this introduction, four sections and six appendices are 
presented. Section II presents the development of the statistical decision
model used by the CATES system and results of other evaluations using mathematical
models versus human models in decision making. Section III presents
the method used for comparing the CATES system decision model with the present 
system of decision making and the operational definitions used in this evalua- 
tion. Section IV presents the results and comparisons of efficient use of 
information in reaching decisions and the quality of the decisions as evidenced 
by performance on a final flight evaluation. Section V presents a discussion 
of the results and formulates conclusions based on the findings with recommenda- 
tions for further applications of the CATES system. 

Appendix A contains a description of the Wald Binomial Probability Ratio 
Test. Appendix B is a listing of the tasks and respective task parameters 
that were used in this evaluation. Appendix C contains the tasks used to 
evaluate decision efficiency as a function of difficulty. Appendix D contains 
a sample grade card used for data recording. Appendix E contains a copy of 
the Naval Air Training and Operating Procedures Standardization Program (NATOPS) 
Evaluation Worksheet. Appendix F contains the mathematical equation used 
for estimating trials to a training decision for the CATES decision model. 
SECTION II

DEVELOPMENT OF CATES DECISION MODEL

NEED FOR ACCURATE PROFICIENCY ASSESSMENT 

Simulator effectiveness evaluations and transfer of training studies 
have been faced with the problem of determining accurate student performance
levels during or after training (Caro, Shelnutt, and Spears, 1981). For
example, errors in performance assessment leading to overtraining result in
lowered training effectiveness ratios (Holman, 1979). The need for accurate
proficiency assessment was recognized by the TAEG while preparing an evaluation
of the training effectiveness of a new state-of-the-art operational
flight trainer (OFT), Device 2F64C, at the east coast SH-3 Fleet Replacement 
Squadron (FRS), Helicopter Antisubmarine Squadron ONE (HS-1). 

In an earlier study to determine the effectiveness of Device 2F87F (P-3 
Operational Flight Trainer) in the FRS, the inadequacies of current FRS grading
procedures for simulator effectiveness evaluations were recognized
(Browning, Ryan, Scott, and Smode, 1977; Browning, Ryan, and Scott, 1978).
To overcome these inadequacies, the TAEG instituted a "proficiency grading 
system." The proficiency grading system provided a simple procedure for 
performance assessment by flight instructors. Each time a task was performed, 
performance was graded on a dichotomous scale that provided a grade of "P" 
if performance met established standards or a grade of "1" if performance
was substandard. These grades were recorded in the sequence of student 
attempts, thus providing a history or protocol of student performance. The 
grading system provided two important attributes for evaluating student per- 
formance: (1) a static or cross sectional grade of performance on a task 
attempt and (2) a dynamic or longitudinal record of performance over several 
attempts.

Determination of proficiency was accomplished by arbitrarily defining 
the point at which proficiency was attained by the following rule: 

1. over 50 percent of the trials (for a given task) on any flight had 
to be "P" and 

2. at least 50 percent of the trials were "P" on all subsequent flights 
(Browning, et al., 1978, p. 23).
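
The two-part rule can be stated operationally. The following Python sketch is illustrative only: the function name and the representation of each flight as a string of "P" and "1" grades are assumptions made here, not part of the Browning, et al. procedure, and each flight is assumed to contain at least one graded trial.

```python
def flight_of_proficiency(flights):
    """Return the 1-based number of the flight on which the rule is first met.

    flights: list of strings, one per flight, each character a trial grade
             ("P" = to standards, "1" = substandard). Hypothetical layout.
    Returns None if the rule is never satisfied.
    """
    def proportion_p(flight):
        return flight.count("P") / len(flight)

    for i, flight in enumerate(flights):
        # Part 1: over 50 percent of trials on this flight are "P".
        # Part 2: at least 50 percent are "P" on every subsequent flight.
        if proportion_p(flight) > 0.5 and all(
            proportion_p(later) >= 0.5 for later in flights[i + 1:]
        ):
            return i + 1
    return None


print(flight_of_proficiency(["11P", "PP1", "P1"]))  # rule first met on flight 2
print(flight_of_proficiency(["P"]))                 # a single trial decides
```

A flight containing a single trial has a proportion of either 0 or 1, so under this rule one grade alone can settle the outcome.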

This approach was not useful in evaluating proficiency for the assessment of 
Device 2F64C at HS-1. The number of flight tasks requiring training was 
considerably greater for HS-1 than those trained in the Browning, et al. 
(1977) study. This larger number of tasks presented a greater range of diffi- 
culty and precluded the training of all tasks during one flight or training 
session (Browning, McDaniel, and Scott, 1981). Further complicating the 
problem of proficiency determination by this arbitrary rule was the fact 
that many tasks were limited to one attempt or trial per flight or session. 
Therefore, in many instances the student would be declared "Proficient" or
"Not proficient" on the basis of one trial if the cited rule was followed. 
Other approaches for determining level of proficiency were investigated.
One such approach was to arbitrarily assess proficiency as being reached 
after the student had demonstrated performance to standards on two, three, 
or four successive trials. Such an approach was used in the Initial Entry 
Rotary Wing Flight Training Program by the Army (USAAVNC Evaluation Team, 
1979). The logic of such an approach was appealing; however, arbitrary selection
of the number of proficient trials needed to demonstrate proficiency does
not account for variability in student performance, task difficulty, and
variability in instructor ratings (Rankin and McDaniel, 1980). Also, both 
the approach used by Browning, et al. (1977) and the USAAVNC Evaluation Team 
(1979) required training protocols that include initial and final levels of 
proficiency to make accurate performance determinations. Neither approach 
could accommodate situations where only a small number of training trials 
are given or where there are wide differences in learning rates of students. 
Further, instructor knowledge of arbitrary decision rules defined in these 
approaches may also bias performance ratings. 

It appears that in actual practice, training decisions are more proba- 
bilistic than deterministic judgments. In other words, instructors and train- 
ing managers infer a probability of a range of acceptable performance by the 
student in the future rather than making an absolute prediction of a specific 
level of performance. The CATES decision model provides a method for assess- 
ing flight task proficiency based on the probabilistic nature of decision 
making. Using this method, an analogy of the training program can be envis- 
aged as a biasing process; students enter the training program with a low 
probability of performing the task to established standards. With successive 
trials, the probability of performing to established standards increases 
until it reaches the desired objective at which time training is terminated. 

In summary, the CATES system promised to achieve two purposes. First,
it appeared to offer a method for the accurate quantification
of student performance levels needed for simulator effectiveness evaluations.
Second, and perhaps more important, the CATES system could provide training
managers and instructors with a valuable tool to aid the decision-making
process.

MATHEMATICAL DECISION MODEL USED IN CATES

A statistical decision model analogous to determining the probability
that a student would perform a task to established standards is a sequential
sampling method introduced by Wald (1947) and described in Rankin and McDaniel
(1980). Appendix A provides a mathematical discussion of the Wald Binomial
Probability Ratio Test used as the statistical decision model. The sequential
sampling method differs from conventional sampling methods. Conventional
sampling methods usually require a fixed number of items randomly drawn from
a larger collection. The sampled items are examined and the decision is
made to accept or reject the entire collection or lot based on this assessment.
Sequential sampling does not use fixed sample sizes, nor are the items
drawn at random from the entire lot. Rather, the items are examined in the
order they are produced. Thus, the sample size required to make a decision
becomes variable and is dependent on four a priori parameters and the variability
of the ordered sequence. The four a priori parameters are:

• minimum proportion of nondefectives at or below which the
collection or lot is rejected (P1), or, conversely, the proportion of
defectives above which the lot is rejected

• desirable proportion of nondefectives at or above which the
collection or lot is accepted (P2)

• risk of making a TYPE I decisional error or declaring the lot
acceptable when in fact it is not (Alpha (α))

• risk of making a TYPE II decisional error or declaring the lot
unacceptable when in fact it is acceptable (Beta (β)).

The variability of the ordered sequence may either reduce or increase 
the sample size required to make a decision. For example, if the sequence 
contains items that are either consistently acceptable or consistently 
unacceptable, a decision may be reached with fewer items. If the sequence 
contains inconsistencies; i.e., both acceptable and unacceptable items, the 
sample size required to reach the appropriate decision will increase. 
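
As a compact restatement, and assuming Wald's standard binomial formulation rather than the specific notation of appendix A, let $s_n$ be the number of nondefective items among the first $n$ examined, and take the hypotheses to be $H_0: p = P_1$ and $H_1: p = P_2$ for the true proportion $p$ of nondefectives. The symbols $s_n$ and $\Lambda_n$ are introduced here for illustration. The log probability ratio and Wald's approximate stopping thresholds are then:

```latex
\Lambda_n \;=\; s_n \,\ln\frac{P_2}{P_1} \;+\; (n - s_n)\,\ln\frac{1 - P_2}{1 - P_1}

\text{accept the lot if } \Lambda_n \ge \ln\frac{1-\beta}{\alpha}, \qquad
\text{reject the lot if } \Lambda_n \le \ln\frac{\beta}{1-\alpha}, \qquad
\text{otherwise examine another item.}
```

Each nondefective item moves $\Lambda_n$ up by a fixed amount and each defective item moves it down, which is why a consistent sequence reaches a threshold quickly and a mixed sequence does not.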

Originally, the sampling procedure was used to determine whether a 
collection of a manufactured product should be rejected because the proportion
of defectives is too high or should be accepted because the proportion
of defectives is below an acceptable level. In this industrial
quality control setting, the inspector needs a chart similar to figure 1 to 
perform a sequential test to determine acceptable levels. As each item is 
observed, the inspector plots a point on the chart one unit to the right if 
it is not defective, or one unit to the right and one unit up if the item is
defective. If the plotted line crosses the upper parallel line or boundary, 
the inspector will reject the production lot. If the plotted line crosses 
the lower boundary, the lot will be accepted. If the plotted line remains 
between the two boundaries, another sample item will be drawn and observed/ 
tested. Because sampling is expensive, a fixed limit on the number of items 
to be sampled may be set. If the limit is reached and the plotted line has
not crossed either the upper or lower boundary, the inspector must then make 
a decision. Generally, the decision will be made to accept or reject based 
on the proximity of the last plotted point to the closest boundary (truncation).
This decision model has been used in educational and training
settings by Ferguson (1970) and Kalisch (1980). Previous use of the model
in training was to evaluate performance after the learning period and to 
serve as an evaluation tool for computer-based instruction that conserved 
testing time by using a minimum sample of items. 
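
The charting procedure described above can be sketched in Python. This is a minimal illustration under stated assumptions, not the procedure of any cited author: the boundary lines are derived from Wald's approximate thresholds, the function and variable names are hypothetical, and a tie at truncation is resolved as "reject" purely by assumption.

```python
import math


def chart_lines(p1, p2, alpha, beta):
    """Slope and intercepts of the two parallel boundaries, expressed in
    (items examined, defectives found) coordinates."""
    a = math.log(p2 / p1)              # log-ratio step for a nondefective item
    b = math.log((1 - p2) / (1 - p1))  # log-ratio step for a defective item (< 0)
    slope = a / (a - b)
    accept_intercept = -math.log((1 - beta) / alpha) / (a - b)  # lower line
    reject_intercept = math.log((1 - alpha) / beta) / (a - b)   # upper line
    return slope, accept_intercept, reject_intercept


def inspect(items, p1, p2, alpha, beta, max_items):
    """items: booleans in production order, True meaning defective.
    Returns (decision, number of items examined)."""
    slope, accept0, reject0 = chart_lines(p1, p2, alpha, beta)
    n = defectives = 0
    for is_defective in items:
        n += 1
        defectives += 1 if is_defective else 0
        upper = slope * n + reject0
        lower = slope * n + accept0
        if defectives >= upper:
            return "reject", n
        if defectives <= lower:
            return "accept", n
        if n >= max_items:
            # Truncation: decide by the nearer boundary (tie -> reject, assumed).
            nearer_accept = (defectives - lower) < (upper - defectives)
            return ("accept" if nearer_accept else "reject"), n
    return "undecided", n


# Hypothetical parameter values, chosen only for illustration.
print(inspect([False] * 10, 0.45, 0.85, 0.10, 0.10, max_items=20))
print(inspect([True] * 10, 0.45, 0.85, 0.10, 0.10, max_items=20))
```

With these illustrative values, a run of consistently acceptable or consistently unacceptable items reaches a boundary after only a few observations, which is the behavior the text attributes to consistent sequences.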

The CATES system decision model uses sequential sampling during the 
learning period and eventually terminates it. Figure 2 illustrates the CATES 
decision model as proposed by Rankin and McDaniel (1980) for assessing flight 
task proficiency. This figure shows a trainee trial sequence of 11PPPPPP. 
Analyzing this sequence using the decision model on the second trial of the 
sequence, the plotted line would cross the lower boundary denoting the student
is "Not Proficient." Thus, the student is remediated and the plot starts
with the next trial. In this particular sequence, that trial is the first P
trial in the sequence. On the sixth trial in the overall sequence (fourth
trial in the new sequence), the plotted line crosses the upper boundary denoting
the student is "Proficient" and training may cease. In this example,
the student received two additional training trials after the CATES decision
of "Proficient, Stop Training."

Figure 1. Hypothetical Sequential Sampling Chart

Figure 2. Sequential Sampling Decision Model for Running Takeoff Task
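
The trial sequence discussed above can be traced with a short Python sketch. It assumes Wald's approximate thresholds and a restart of the plot after each "Not Proficient" decision, as the text describes. The parameter values P1 = .45 and P2 = .85 are hypothetical, chosen here only because they reproduce the decisions described for the 11PPPPPP sequence; the values actually used for each task are those listed in appendix B. The function name is likewise an assumption.

```python
import math


def cates_decisions(grades, p1, p2, alpha, beta):
    """Return a list of (trial number, decision) events for a grade sequence.

    grades: string of trial grades in the order attempted,
            "P" = to standards, anything else = substandard.
    """
    upper = math.log((1 - beta) / alpha)   # at or above: "Proficient"
    lower = math.log(beta / (1 - alpha))   # at or below: "Not Proficient"
    step_pass = math.log(p2 / p1)
    step_fail = math.log((1 - p2) / (1 - p1))

    events = []
    total = 0.0
    for trial, grade in enumerate(grades, start=1):
        total += step_pass if grade == "P" else step_fail
        if total >= upper:
            events.append((trial, "Proficient"))
            break                          # stop training
        if total <= lower:
            events.append((trial, "Not Proficient"))
            total = 0.0                    # remediate; plot restarts next trial
    return events


events = cates_decisions("11PPPPPP", p1=0.45, p2=0.85, alpha=0.10, beta=0.10)
print(events)
```

Under these assumed values the sketch reports "Not Proficient" on trial 2 and "Proficient" on trial 6, leaving the last two trials of the sequence as training beyond the decision point.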

SIMILARITY OF CATES DECISION MODEL AND CURRENT DECISION METHOD

The mathematical algorithm used in the CATES system closely parallels 
the current decision method used by the training manager or instructor to 
determine when to terminate training. Like CATES, the human judgment method 
bases decisions on varying numbers of practice trials on the task rather 
than requiring a fixed number of practice trials. Consistency of student 
performance on training tasks is also considered in determining the appropriate
number of trials. Students that perform consistently well on a task
are considered proficient with less task performance information than those 
students that perform inconsistently. Instructors and training managers 
also appear to consider the risks involved in making an inappropriate decision. 

The advantage of the CATES decision model appears to be the quantifica- 
tion of acceptable (proficient) performance, unacceptable (not proficient) 
performance, and the risks (alpha and beta) involved in making an inappro- 
priate decision. The problem then is to assess the advantages offered by 
the mathematical algorithm in increasing the effectiveness of training deci- 
sions. The quantifying of performance and risk gained through the use of 
the mathematical algorithm is an obvious advantage in training effectiveness 
evaluations. Other practical advantages involve a better means to aggregate
inconclusive information concerning student performance and a decision accuracy
greater than that of the current method.

ADVANTAGES OF MATHEMATICAL DECISION MODELS 

Considerable investigation has been conducted on human decision behavior 
and the cognitive processes humans employ to make choices and solve decision- 
related problems. Comprehensive reviews of the experimental literature are 
available: Imhoff and Levine (1981), Lee (1971), Nickerson and Feehrer (1975),
Rapoport and Wallsten (1972), Slovic, Fischhoff, and Lichtenstein (1977), 
and Slovic and Lichtenstein (1971). Some relevant areas of study include: 
statistical decision theory (Fishburne, 1964), game theory (Luce and Raiffa,
1957), and probabilistic information systems (Edwards, 1962). 

It is generally found that decisions reached by mathematical models are
considerably more consistent and accurate than decisions based on human judgment
(Dawes and Corrigan, 1974; Meehl, 1954; Sawyer, 1965). It appears that
human judgment decisions require more data than mathematical models as a
result of poorly defined parameters and biases in the processing of information
for decisions (Slovic, 1976; Tversky and Kahneman, 1974). Dawes (1979)
proposes that mathematical models are especially good at aggregating informa- 
tion resulting in the more efficient use of available information. Dawes 
further suggests that humans have expertise in perceiving and sorting infor- 
mation that cannot be matched by a mathematical model. 




Technical Report 130 



Given that human judgment excels in perceiving and sorting information 
and that mathematical models are especially good at combining or aggregating 
information, it appears that a combination of these models should considerably 
enhance the decision-making process. It follows that a combination of people 
assessing trial performance and a mathematical model determining the integration
and quantity of these assessments should substantially increase the
validity and reliability of training decisions. The potential value of a 
CATES decision model has been recognized for aviation management. Mixon 
(1981) recommended the decision model be used to assess proficiency of naval 
flight officers undergoing training at the A-6 aircraft Fleet Replacement 
Squadrons. 

Although previous research has indicated mathematical models may provide 
a potentially valuable decision making tool, results have generally been 
limited to laboratory studies and experiments. Evidence is needed to support 
the practical use of a mathematical decision model in a considerably more 
unstructured environment. To satisfy this need, this evaluation was conducted 
to extend the knowledge of the mathematical decision model to a direct appli- 
cation in training. 

APPLICATION OF THE CATES SYSTEM DECISION MODEL

To examine the practicality of the CATES system decision model in a 
realistic training situation, an evaluation was conducted "in-situ" at HS-1,
Naval Air Station, Jacksonville, Florida. Concurrent with this study, the 
TAEG was evaluating Device 2F64C (Browning, McDaniel, and Scott, 1981; 
Browning, McDaniel, Scott, and Smode, 1982). The test plan for this evalua- 
tion required instructors to use the proficiency grading system to record 
task trial performance of students undergoing flight training. As discussed 
previously in this section, the recording of task trial data forms an integral, 
necessary component of the CATES system decision model. In addition to the 
current method of making training decisions, data were recorded in a manner 
usable by the CATES system. Although the proficiency grading system posed 
an additional requirement for the instructors, it does not appear to over- 
burden them in accomplishing their duties. Further, most instructors seem 
to have accepted the proficiency grading system as a more useful method than 
current grading practices. 

Rankin and McDaniel (1980) envisaged that full implementation of the 
CATES system would require computer support. Although computer support is 
available to HS-1 through the Aviation Training Support System (ATSS), the 
TAEG and HS-1 agreed that before using the ATSS for computer support, the
efficacy of the CATES system should be evaluated to determine if advantages
could be realized. If advantages using CATES were realized, full implementation
could be initiated.

Full implementation would require data input to the ATSS. Although
this may appear to be an additional requirement, the CATES system may provide
a more efficient method of management control than the present system of
maintaining "hard copy" records. 

In summary, the CATES system appears to place little more burden
on the training manager than the current methods and may actually relieve
certain requirements. This is contingent upon how well the CATES system
"works" in the actual training environment. The method used to determine
how well the CATES system "works" is presented in the next section.
SECTION III
METHOD 

STUDENTS 

The student sample consisted of 29 newly designated naval aviators under- 
going Fleet Replacement Pilot Training in the SH-3 aircraft at HS-1. The 
students were recent graduates of Undergraduate Pilot Training at Pensacola, 
Florida, and had no prior flight experience in the SH-3 aircraft. 

TASKS

The student was required to master approximately 190 flight tasks during 
Fleet Replacement Pilot Training to become qualified to fly the SH-3 aircraft. 
From the task inventory of 190 tasks, 18 tasks (appendix B) were selected to 
evaluate the CATES decision model proposed by Rankin and McDaniel (1980). 
These 18 tasks were representative of the range of difficulty for tasks in
the inventory as well as tasks introduced in early and later stages of training. 

Task difficulty was determined by a task sort into categories of "easy," 
"medium," and "difficult" and rank ordering of the 18 tasks by subject matter 
experts (HS-1 instructor pilots). From this pool of 18 tasks, 9 tasks were
selected with 3 from each category to assess the efficient use of information 
needed to reach a decision. These nine tasks and categories are presented 
in appendix C. 

INSTRUCTORS 

Flight task training was provided by the 28 regular HS-1 flight instruc- 
tors. All instructors had completed at least one tour in an operational 
assignment and the training course for flight instructors at HS-1. All 
instructors were briefed on the grading procedures currently in use as well 
as the proficiency grading system. 

MATERIALS AND EQUIPMENT 

Standard training materials and equipment were used by students and 
instructors at HS-1. No additional equipment or materials were required to 
obtain data and/or information necessary for this study. The primary data 
collection instruments were the standard syllabus grade card (appendix D) 
and the Naval Air Training and Operating Procedures Standardization Program 
(NATOPS) Flight Evaluation Worksheet (appendix E). 

To facilitate retrieval of task trial information and calculate CATES 
system decisions, data from the grade cards were entered on a WANG 2200 MVP 
computer at the TAEG.

PROCEDURE 

As students proceeded through the training syllabus, performance was 
graded on the Syllabus Grade Card using both current procedures; i.e., NATOPS, 
and the proficiency grading system procedure. The NATOPS procedure grades
task performance in three categories or classifications: "Q" or Qualified
(performance meets or surpasses NATOPS standards), "CQ" or Conditionally
Qualified (performance not to established standards, but does not exhibit
safety violations), and "U" or Unqualified (performance not to standards and
safety violations are exhibited). The NATOPS grade for each task is a summary
of all task trials; i.e., there is only one NATOPS grade for each task
for each flight or session. In addition to the NATOPS grading procedure,
the grades for each practice trial on each task were recorded in the sequence
the trial was attempted (Proficiency Grading System). Syllabus grade cards
were collected after each training flight or session. Data from each grade
card were then entered into the WANG 2200 MVP.

Upon completion of the training syllabus and at the discretion of the 
instructor pilot/training manager, each student was scheduled for a final 
NATOPS flight evaluation. The instructor pilots/training manager were not 
apprised of any decisions made by the CATES system decision model.

The NATOPS flight evaluation for each student was made by one of eight 
designated instructor pilots. Flight evaluation grades were recorded on the 
NATOPS Flight Evaluation Worksheet. It should be noted that the worksheet 
does not specify discrete tasks in the same manner as the syllabus grade 
cards. However, if the student performance is below standards set by NATOPS,
the evaluator is required to specify the task and explain why the task was 
not performed to standards. Thus, performance of specific tasks on the flight 
evaluation could be obtained. Upon completion, the NATOPS evaluation flight 
worksheets were collected. These worksheets were reviewed and a determination 
was made concerning the evaluation grade for each task and each student;
i.e., Qualified, Conditionally Qualified, or Unqualified. 

DEPENDENT MEASURES FOR THE CURRENT DECISION METHOD 

Two dependent measures were extracted from the data collected: (1) 
task performance information required to reach a decision and (2) the level 
of student proficiency upon completion of the training program. 

Task performance information required to reach a training decision was 
determined as the total number of practice trials the student attempted in 
the flight training program. Each practice trial was envisaged as a bit
of information the instructors acquired concerning student performance. 

The level of student proficiency was determined by the NATOPS grade 
awarded for each task on the last evaluation of training. Grades awarded on 
this basis would be more likely to use the same standards as required by the
NATOPS flight evaluation. A grade of Qualified would indicate the instructor 
was confident the student was proficient and would perform the task to standards 
on the NATOPS evaluation. A grade of Conditionally Qualified would indicate 
the instructor was less confident the student would perform to standards and 
could benefit from additional training. 
CATES SYSTEM PARAMETERS

The CATES system decision model requires that four values be established:
(1) the lowest acceptable proportion of proficient trials at or below which
the student is considered "Not Proficient" (P1), (2) the proportion of proficient
trials at or above which represents proficient performance (P2), (3)
the probability of a TYPE I decision error (Alpha or α), and (4) the probability
of a TYPE II decision error (Beta or β).

The P1 parameter values were determined from an examination of first
trial performance data from a group of 17 students undergoing training at
HS-1. The proportion of acceptable trial performances to the total of first
performances was used to set the P1 value for each task.

The P2 parameter values were determined from the performance of 50 naval
aviators on the NATOPS flight evaluation. The proportion of Qualified grades
to the total grades awarded was calculated. The P2 values were then established
at one-half standard deviation units below the mean proportion.
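
The two estimates can be sketched in Python as follows. The function names and input layouts are hypothetical, and the data in the usage lines are invented for illustration. The report does not state over what units the mean and standard deviation were taken, nor whether a sample or population standard deviation was used; the sketch assumes a list of proportions and the sample standard deviation.

```python
from statistics import mean, stdev


def estimate_p1(first_trial_grades):
    """Proportion of acceptable ("P") grades among first-trial attempts.

    first_trial_grades: string with one grade per student for a single task.
    """
    return first_trial_grades.count("P") / len(first_trial_grades)


def estimate_p2(qualified_proportions):
    """Mean proportion of Qualified grades minus one-half standard deviation.

    qualified_proportions: list of proportions (assumed unit of analysis).
    Uses the sample standard deviation, which is an assumption.
    """
    return mean(qualified_proportions) - 0.5 * stdev(qualified_proportions)


# Invented data: 17 first-trial grades, 6 of them acceptable.
p1 = estimate_p1("PPPPPP" + "1" * 11)
# Invented proportions of Qualified grades.
p2 = estimate_p2([0.9, 0.8, 1.0, 0.9])
print(round(p1, 3), round(p2, 3))
```

Setting P2 below the observed mean makes the proficiency criterion somewhat easier to reach than average evaluation performance would imply; how much easier depends on the spread of the proportions.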

In the present study, parameter values for α and β were arbitrarily
selected at .10. The parameters for the representative sample of 18 tasks
used in this study are shown in appendix B.

DEPENDENT MEASURES FOR THE CATES SYSTEM DECISION MODEL 

As in the current decision method, two dependent measures were extracted 
from the data collected. These were: (1) task performance information required 
to reach a decision and (2) the level of student proficiency. 

Task performance information required was determined to be the total 
number of practice trials attempted before a CATES system decision was reached. 
It is important to note that because training and task practice terminated 
at the discretion of the instructor or training manager, there is a possi- 
bility the CATES system decision model would not have sufficient task trial 
information to reach a decision. If the task protocol had not resulted in 
crossing the upper boundary of the decision model (indicating student profi- 
ciency was "Undetermined"), an estimate was made of the number of additional 
trials required to make a decision. This estimate is based on a mathematical 
equation developed by Hoel (1971) and is shown in appendix F. Parameter
values used in this equation were the same values set for each task. The 
estimated trials to a decision were then added to the total number of trials 
actually attempted. Using this procedure, it was possible for the CATES 
system decision model to require either less, equal, or more trial information 
to reach a decision than the current decision method for each task or student. 

A proficient level of performance by the student was determined if the 
CATES system decision model reached a "Proficient, Stop Training" decision 
based on actual trials. Thus, a "Proficient, Stop Training" decision was 
considered equivalent to the current decision method of awarding a "Qualified" 
grade for the task on the last training flight/session. 
CRITERION FOR EVALUATION OF DECISIONS

The criterion used to evaluate the accuracy of the training decisions 
made by the current decision method and the CATES system decision model was 
the student's graded task performance on the NATOPS flight evaluation. If a 
decision concerning proficient level of performance was reached and subsequent
task performance on the NATOPS flight evaluation was graded as Qualified, 
the decisions were considered correct. If the grade on the NATOPS flight 
evaluation was either Conditionally Qualified or Unqualified, the decision 
was considered incorrect. 
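
The scoring criterion can be expressed as a short Python sketch. The record layout and function name are assumptions made for illustration, and the data are invented; only decisions of proficiency are scored, as the criterion above is stated in those terms.

```python
def decision_accuracy(records):
    """Proportion of proficiency decisions confirmed by the NATOPS evaluation.

    records: list of (declared_proficient, natops_grade) pairs, one per
             student-task combination, with natops_grade in {"Q", "CQ", "U"}.
    Returns None when no proficiency decision was reached.
    """
    scored = [grade for declared, grade in records if declared]
    if not scored:
        return None
    # A decision is correct only if the evaluation grade is Qualified.
    return sum(grade == "Q" for grade in scored) / len(scored)


# Invented records for illustration.
records = [(True, "Q"), (True, "CQ"), (False, "Q"), (True, "Q")]
accuracy = decision_accuracy(records)
print(accuracy)
```

The same function can be applied separately to the decisions of each method so that the two accuracies are computed against an identical criterion.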

The information requirements and accuracy for the current decision method 
and the CATES system decision model were compared. Results of that comparison 
are described in the next section.

SECTION IV
RESULTS 

MODEL INFORMATION REQUIREMENTS 

The first analysis dealt with the amount of information required by the 
two decision methods to reach a decision as a function of task difficulty. 
Results of the analysis of variance (ANOVA) are shown in table 1. 



TABLE 1. SOURCE TABLE FOR ANOVA OF INFORMATION REQUIREMENTS
OF TWO DECISION METHODS AND THREE TASK DIFFICULTY LEVELS

Source                           Sum of Squares   df            MS         F
A (Decision Method)                    1250.702    1       1250.702    109.81*
  Error                                 318.922   28         11.390
B (Task Difficulty)                      73.215    1 (Adj)   36.607      3.22
  Error                                 637.075   43 (Adj)   11.376
AB (Method x Task Difficulty)           452.332    2        226.166     43.22*
  Error                                 293.058   56          5.233

*p < .05



Because the ANOVA was a repeated measures design, it was suspected that
certain assumptions or requirements of the ANOVA may have been violated;
i.e., lack of homogeneity or additivity. A conservative F-test with reduced
degrees of freedom was conducted using the procedures recommended by Myer 
(1979). This conservative F-test still revealed significant differences for 
the A main effect (Decision Method) and the AB interaction effect (Method x 
Task Difficulty). However, for the B main effect (Task Difficulty), the 
test of significance failed to reach the critical level of .05. An epsilon 
factor (.7693) was determined from the variance-covariance matrix as 
recommended by Greenhouse-Geisser. With this adjustment to the degrees of 
freedom, the B main effect (Task Difficulty) did not reach the .05 level of 
significance. 
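
The F ratios in table 1 and the effect of the epsilon adjustment can be checked with a few lines of arithmetic. The sketch below is only an illustration: the sums of squares, mean squares, and epsilon are taken from the text, while the unadjusted degrees of freedom for the B effect (2 and 56) are an assumption inferred from MS = SS / df, since the table prints only the adjusted values.

```python
# Values as printed in table 1. The unadjusted df for the B effect (2 and 56)
# are an assumption inferred from MS = SS / df; the table shows adjusted df.
EPSILON = 0.7693  # Greenhouse-Geisser epsilon reported in the text

effects = {
    "A": {"ss": 1250.702, "df": 1, "ss_err": 318.922, "df_err": 28},
    "B": {"ss": 73.215, "df": 2, "ss_err": 637.075, "df_err": 56},
    "AB": {"ss": 452.332, "df": 2, "ss_err": 293.058, "df_err": 56},
}


def f_ratio(effect):
    """F = (SS_effect / df_effect) / (SS_error / df_error)."""
    ms_effect = effect["ss"] / effect["df"]
    ms_error = effect["ss_err"] / effect["df_err"]
    return ms_effect / ms_error


def adjusted_df(effect, epsilon=EPSILON):
    """Multiply both degrees of freedom by epsilon (conservative F-test)."""
    return epsilon * effect["df"], epsilon * effect["df_err"]


f_values = {name: round(f_ratio(e), 2) for name, e in effects.items()}
b_df_adjusted = adjusted_df(effects["B"])

if __name__ == "__main__":
    print(f_values)        # F ratios for A, B, and AB
    print(b_df_adjusted)   # epsilon-adjusted df for the B effect
```

The adjustment leaves the F ratio itself unchanged; it only reduces the degrees of freedom against which the ratio is evaluated, which is why the B effect can fall short of the .05 level after adjustment.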

To determine significant differences within the interaction effect, the 
Tukey's Wholly Significant Difference (WSD) was computed. Any differences 
in the means greater than 2.224 may be considered significant at the .05 
level. Figure 3 graphically shows the relationship between decision method 
and task difficulty as a function of average trials required to reach a 
"stop training" decision. The figure shows that the CATES decision model 
required less information to make a "stop training" decision across all 
levels of task difficulty. The information requirements for the CATES 
decision model become greater as task difficulty increases. Reliable 
differences were found between information requirements for easy (x̄ = 4.8
trials) and difficult tasks (x̄ = 8.2 trials) assessed by the CATES model.
For the human judgment procedure, it appears the reverse is true. More
information was collected on the easy tasks (x̄ = 14.6 trials) than on the
medium (x̄ = 10.2 trials) or difficult tasks (x̄ = 10.5 trials). Differences
in information requirements for medium and difficult tasks reached by human
judgment were not reliable. The data indicate the CATES model requires less
information to reach a decision than human judgment, and the information
requirements for CATES appear to trend in a logical manner; i.e., more
difficult tasks require more trial information.

Figure 3. Trials Required to Reach a "Stop Training" Decision for Two
Decision Methods Across Three Levels of Task Difficulty
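
The within-method comparisons described in this section can be reproduced by comparing each pair of reported means against the WSD value of 2.224. The following is a minimal sketch using only the means stated in the text; the CATES mean for medium tasks is not reported and is therefore left out, and the function and variable names are illustrative.

```python
from itertools import combinations

WSD = 2.224  # critical difference reported in the text (.05 level)

# Mean trials to a "stop training" decision, as reported in the text.
# The CATES mean for medium tasks is not stated, so it is omitted.
mean_trials = {
    ("CATES", "easy"): 4.8,
    ("CATES", "difficult"): 8.2,
    ("Instructor", "easy"): 14.6,
    ("Instructor", "medium"): 10.2,
    ("Instructor", "difficult"): 10.5,
}


def reliable_pairs(means, critical=WSD):
    """Flag within-method pairs whose mean difference exceeds the WSD."""
    results = {}
    for (cell_a, mean_a), (cell_b, mean_b) in combinations(means.items(), 2):
        if cell_a[0] != cell_b[0]:
            continue  # compare difficulty levels within one method only
        results[(cell_a, cell_b)] = abs(mean_a - mean_b) > critical
    return results


comparisons = reliable_pairs(mean_trials)

if __name__ == "__main__":
    for pair, reliable in comparisons.items():
        print(pair, "reliable" if reliable else "not reliable")
```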

ACCURACY OF DECISION METHODS 

To determine the degree to which the two decision methods were able to
"predict" the student's performance on the final NATOPS flight evaluation
and to compare the judgments made by each method across the three levels of
task difficulty, the following analysis was done. The judgment made for
each method was the proportion of "Qualified" or "Proficient, Stop Training"
decisions to the overall possible decisions that could be made. There were
87 possible decisions (3 tasks x 29 students) for each level of task
difficulty. From these 87 possible decisions, the last instructor grade awarded
was determined. If the final grade was a "Q" or Qualified, it was counted
as a Qualified judgment made. If it was a "CQ" or Conditionally Qualified
judgment, it was not considered to be a Qualified judgment. For the CATES
decision model, only those student task protocols that crossed the upper
boundary resulting in a "Proficient, Stop Training" judgment were considered
as a "Qualified" judgment. Each of these judgments from both methods was
then matched against the task-student evaluation made on the final NATOPS
flight evaluation. A Qualified judgment made was considered correct if a
Qualified grade for that task was awarded on the NATOPS flight evaluation.

Table 2 shows the results of this examination of the proportion of Qualified
judgments made and the proportion of correct judgments for the nine
tasks. A test for proportions revealed no significant differences in the
proportion of Qualified judgments made between decision methods. There were
no significant differences found in the proportions of correct judgments
made between methods.

TABLE 2. PROPORTION OF QUALIFIED JUDGMENTS AND PROPORTION OF
CORRECT DECISIONS MADE BY EACH DECISION METHOD ACROSS
THREE LEVELS OF TASK DIFFICULTY

                                   TASK DIFFICULTY
               Easy (N=87)          Medium (N=87)        Difficult (N=87)
METHOD       Qualified  Correct    Qualified  Correct    Qualified  Correct
             Judgments  Judgments  Judgments  Judgments  Judgments  Judgments
CATES          .9885      .9884      .6092      .8302      .6092      .8112
Instructor     .9885      .9884      .7816      .7794      .7126      .7097

Although no significant or reliable differences were found, it was noted
that the proportion of Qualified judgments made decreased as task difficulty
increased. This supports the intuitive judgment that the more difficult or
complex tasks are somewhat more difficult to evaluate with confidence. The
CATES decision model appeared to be more conservative or less willing to 
make a judgment as task difficulty increased. However, once a decision had 
been made, the CATES decision method tended to be more correct than the 
instructor method. 
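
The proportions in table 2 can be translated back into approximate decision counts, which makes the trade-off between judgments made and judgments correct easier to see. The sketch below assumes that the "correct" proportion is computed relative to the Qualified judgments actually made (for example, 44 of 53 gives .8302); that reading is an inference from the numbers, not something the table states, and the names used are illustrative.

```python
N_DECISIONS = 87  # 3 tasks x 29 students per difficulty level

# (proportion of Qualified judgments, proportion correct) from table 2
table2 = {
    ("CATES", "easy"): (0.9885, 0.9884),
    ("CATES", "medium"): (0.6092, 0.8302),
    ("CATES", "difficult"): (0.6092, 0.8112),
    ("Instructor", "easy"): (0.9885, 0.9884),
    ("Instructor", "medium"): (0.7816, 0.7794),
    ("Instructor", "difficult"): (0.7126, 0.7097),
}


def implied_counts(p_qualified, p_correct, n=N_DECISIONS):
    """Recover whole-number counts from the rounded proportions.

    Assumes p_correct is relative to the Qualified judgments made.
    """
    qualified = round(p_qualified * n)
    correct = round(p_correct * qualified)
    return qualified, correct


counts = {cell: implied_counts(*props) for cell, props in table2.items()}

if __name__ == "__main__":
    for (method, level), (qualified, correct) in counts.items():
        print(f"{method:10s} {level:9s} qualified={qualified} correct={correct}")
```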

Considering this trend toward increased accuracy or correctness of judg- 
ments made, the entire sample of 18 tasks was assessed for Qualified judgments 
made and the accuracy of the judgments. Results indicated that for 12 of 
the 18 tasks, CATES was more correct in the judgments made. Proportions of 
correct decisions were equal for the instructor and CATES method on 2 of the 
18. Instructor judgments appeared to be more correct on 4 of the 18 tasks. 
A sign test revealed that CATES was reliably more correct in judgments than 
the instructors beyond the .05 level of significance. This finding would 
support a conclusion that if CATES decisions were used to determine proficiency 
across the training syllabus, a more accurate assessment would be made concern- 
ing student proficiency than the present method of instructor judgments. 
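
The sign test reported above can be illustrated with the counts given in the text: CATES more correct on 12 tasks, instructors on 4, and 2 ties. The sketch below drops the ties and computes a one-tailed binomial probability; the report does not say whether a one- or two-tailed test was used, so the one-tailed form is an assumption.

```python
from math import comb


def sign_test_one_tailed(wins, losses):
    """One-tailed sign test: P(X >= wins) for X ~ Binomial(wins + losses, 0.5).

    Ties are excluded before calling this function.
    """
    n = wins + losses
    return sum(comb(n, k) for k in range(wins, n + 1)) / 2 ** n


# 12 tasks favor CATES, 4 favor the instructors, 2 ties are dropped.
p_value = sign_test_one_tailed(12, 4)

if __name__ == "__main__":
    print(f"one-tailed p = {p_value:.4f}")
```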
SECTION V

DISCUSSION, CONCLUSIONS, AND RECOMMENDATIONS 

Results of this evaluation indicate the CATES system decision model, 
using the parameter values established in the present study, requires less 
information to make a decision than the current system of human judgment. 
Decisions reached by the CATES system reflected a higher proportion of correct 
decisions in reference to the NATOPS flight evaluation. Across a representa- 
tive sample of 18 tasks, the CATES system was reliably more accurate than 
the current method of human judgment. The finding that the CATES system 
requires less trial information to reach a decision of greater precision 
strongly supports the CATES system decision model's superior efficiency when 
compared to the current method of making training judgments. Results of 
this study extend previous research results suggesting greater consistency 
and accuracy of mathematical models to an actual training situation in a 
considerably more unstructured environment. 

The proportion of judgments concerning student proficiency for easy 
tasks was high and equal for both methods. As task difficulty increased, 
however, the CATES system model made a lower proportion of decisions than 
the current method of instructor judgments. The conservatism or riskiness 
of the CATES system model is established through parameter values, specifically
alpha (α) and beta (β). Since these parameter values were held constant
across all tasks and levels of task difficulty, it is reasonable to conclude 
the instructors were willing to take more risks in decisions made on medium 
and difficult tasks. This willingness to take greater risks may result in 
the lowered proportions of correct decisions made by the instructors. Results 
of this study are similar to a study of human decision making behavior in a 
sequential testing situation reported by Becker (1958). According to Becker, 
subjects appeared to operate more like Wald's sequential sampling model when 
the problem was difficult than when the problem was easy. Typically, subjects 
required relatively more samples or information on easy problems and relatively 
less information on the difficult, as if they set alpha (α) and beta (β)
lower for the easy problems. 

The reasons why instructors obtained considerably more information than 
the CATES system required for easy tasks remains unclear. This may have 
resulted from: 

• easy tasks being introduced earlier in the training program, allowing
more time for practice

• easy tasks being prerequisite to the performance of the more difficult
tasks; e.g., normal starting of the engines was required to
accomplish more difficult flight tasks

• instructors allowing students to perform easy tasks so that successful
performance would motivate the student to perform better on
the more difficult tasks

• instructors being reinforced by the student's demonstrated high
levels of performance on the easy tasks, thus increasing the
probability the instructor will request the student to perform the
task again

• instructors obtaining information about student performance of
easy tasks at a lower "cost." Easier tasks are probably
less complex to evaluate and do not require as high a degree of
actual physical risk to the student and instructor as more difficult,
complex tasks present.

Whether there are single or multiple causes, it would appear that easy
tasks are "overlearned" as a result of significantly more practice allowed. 
This overlearning probably results in considerable performance consistency 
by the student resulting in high agreement between the current decision 
methods, the CATES system decision model, and the NATOPS flight evaluation. 
Considering the greater consistency of performance and the high agreement 
between decision and evaluation method, it appears that overlearning is 
highly desirable. 

The more salient issue from the training manager's point of view concerns
the cost of "overtraining." The results indicated the amount of training
provided for easy tasks in excess of that required to make a CATES system
decision; however, it was not within the scope of this study to determine
the economic or training costs incurred by training beyond acceptable
proficiency levels. If such an evaluation were conducted in the future, it would
be necessary to consider several possible causes of "overtraining" rather
than simply the amount and cost of providing training beyond required levels.

Of considerable interest to the training manager is the issue of "undertraining"
the medium and difficult tasks. Neither the CATES system nor the
current human judgment method was able to render qualified or proficient
judgments in 20 to 40 percent of the proficiency decisions. A paradox seems
to exist in the data. While the CATES system decision model appeared to be
more conservative in making a judgment than the current decision method, the
amount of trial information needed to reach a decision was reliably less for
the CATES decision model. It would appear logical that a relatively conservative
method would require more data or task performance information. Training
trial sequences were individually examined to determine reasons for this
apparent paradox. The observation was made that students demonstrating consistent
proficient performance continued to perform training trials well
after the CATES decision (overlearning). Conversely, students with more
variable task protocols were not afforded the opportunity to practice the
task with a sufficient number of trials needed to reach a CATES decision.
It would appear that the paradox of the more conservative model requiring
less information to make a decision could be attributed to under- and
overtraining in the medium and difficult tasks.

An important methodological restriction was placed on this evaluation. 
Students proceeded through the training program at the discretion of the 
instructor/training manager under the current decisional method. In the
event the CATES system reached a decision, training may have continued.
Although in a strict sense, the CATES system would consider the additional
task training and trial information unnecessary to reach a decision, in a
practical sense this additional training may have been an important factor
in the final NATOPS flight evaluation. Certainly no implication should be
made that training trials beyond a CATES system decision of proficiency are
unnecessary overtraining. The next logical step in eliminating this method- 
ological flaw would be to further evaluate the CATES system with methodology 
similar to that used in the present study, with the exception that the pro- 
cedure should provide for additional training beyond the current decision 
method if additional information is required to reach a CATES system decision. 
Thus, the proportion of proficient judgments made by the CATES system would 
increase. If results were found similar to this study, strong evidence would 
be available for employing the CATES system in an important role for training 
decisions. 

It should be noted that the criterion measure for both the current decision
method and the CATES system decision model was performance on the NATOPS
flight evaluation. Although the NATOPS flight evaluations are conducted 
using specially selected, experienced naval aviators, no measures of validity 
or reliability have been determined for that procedure. Essentially, perfor- 
mance on the NATOPS flight evaluation is determined in the same manner as 
used by the current decision method on training flights. The fact that NATOPS 
evaluators are specially selected, experienced, and trained may very well 
result in a greater reliability for the NATOPS evaluation of flight perfor- 
mance. Nevertheless, it is still subject to the problems of human variabil- 
ity; i.e., biases, varying standards, personal interaction. Determination 
of the validity and reliability of criterion measures is a difficult and 
elusive task. However, if naval aviation continues to use the NATOPS flight 
evaluation as a yardstick to measure flight performance, it is desirable 
that this task be undertaken.

CONCLUSIONS AND RECOMMENDATIONS

Based on the results of this study, the following conclusions and
recommendations are made.

1. CONCLUSION: The CATES system is particularly useful to manage the
training syllabus at the lowest element of the syllabus (flight task).
Changes to the syllabus resulting from addition/deletion/modification of
the flight tasks can be made quickly and efficiently.

RECOMMENDATION: HS-1 should consider extending the current ATSS of
managing the syllabus at the event level to tasks trained within each event.

2. CONCLUSION: The determination of student performance at the task level
rather than event/session/flight level provides a more well defined picture
of student performance. It allows the instructor/training manager to
determine student strengths and weaknesses in a more timely manner.

RECOMMENDATION: HS-1 should focus on student performance of individual
tasks in the training syllabus. Capabilities of the ATSS to record student
performance on events/sessions/flights should be extended to record student
task performance within an event/session/flight.

3. CONCLUSION: The Proficiency Grading System provides student task
performance information with better definition and specificity than the
currently used NATOPS grading procedures.

RECOMMENDATION: HS-1 should continue to use the proficiency grading
procedures for Category I replacement pilots. The proficiency grading
procedure should be extended to include all categories of replacement pilots.

4. CONCLUSION: The CATES system decision model appears to be more
efficient and accurate than the current method of determining student task
proficiency and making training judgments.

RECOMMENDATION: Based on positive results of future evaluations, HS-1
should consider incorporating the CATES system decision model to augment
the current method for making training decisions.

5. CONCLUSION: Method used for establishing CATES system model
parameters; i.e., P1, P2, (α), and (β), appears to be reasonable and in
general agreement with the present system of making training decisions.

RECOMMENDATION: Continue using this method to establish parameters for
all tasks to be trained in the replacement pilot training syllabus.

6. CONCLUSION: Data from this study indicate considerable variability in
instructor judgments. Levels of risk appear to vary with task difficulty
and instructors when using the NATOPS grading procedures. This variability
and instructor bias may affect the reliability of the NATOPS grading
procedure to a considerable degree.

RECOMMENDATION: ... standardize instructors to reduce variability in
grading student performance, thus increasing the reliability of the grading.

7. CONCLUSION: The CATES system decision model is useful to preclude the
"undertraining" of tasks.

RECOMMENDATION: Student task performance should be evaluated using the
CATES decision model. If proficiency level cannot be determined by the
CATES system within parameters used by the system, training should be
continued until a decision is reached.

8. CONCLUSION: The CATES system decision model may be useful to determine
excessive task training (overtraining).

RECOMMENDATION: Tasks that are trained beyond levels required by the CATES
system decision model should be carefully monitored to ensure the
additional training is desirable for improving student performance across
the overall flight syllabus.

9. CONCLUSION: The CATES system could be adapted to other FRS flight
training programs.

RECOMMENDATION: If subsequent evaluations reveal the CATES system
continues to result in greater efficiency and higher accuracy in reaching
training decisions, other FRSs may consider incorporating the CATES system
into their training programs.

10. CONCLUSION: The CATES system may provide a more efficient and accurate
method of determining student performance in Undergraduate Pilot Training.

RECOMMENDATION: If subsequent evaluations reveal the CATES system
continues to result in greater efficiency and higher accuracy in reaching
training decisions, the Chief of Naval Air Training should consider
evaluating the CATES system for possible inclusion in Undergraduate Pilot
Training.

11. CONCLUSION: The validity or reliability of the NATOPS flight
evaluation has not been determined. The evaluation is subject to the same
vagaries and variability noted in evaluating student performance in the
training program.

RECOMMENDATION: Naval Air Systems Command should consider initiating a
program to determine the validity and reliability of the NATOPS flight
evaluation program.

POST NOTE



This study provides evidence that a mathematical decision model, specifically
the CATES system decision model, can powerfully augment present training
decision methods for replacement pilots undergoing training at the FRS.
It is worth noting that in addition to achieving more accurate and precise
training decisions, the CATES system also provides a useful tool for the
management of a curriculum. The CATES system provides documentation as well
as student performance measures at the lowest level or element of the
curriculum; i.e., the flight task. This documentation and recordkeeping, combined
with its apparent effectiveness as a tool for making training decisions, makes the
CATES system especially suitable as a computer-based or computer-managed
instructional system.

As a result of this conceptual logic and the findings in this study,
HS-1 is aggressively pursuing the incorporation of the CATES system into the
ATSS to aid in increasing the efficiency of training management. Upon
completion of this effort, it is envisaged that the procedures used in
incorporating the CATES system into the ATSS, accompanied by a user's manual, will be
published in a future TAEG report.

In addition, further evaluation of the CATES system decision model is
being planned at HS-1 to provide additional training required to reach a
CATES system decision. Such an evaluation will extend the findings in this
study by providing actual rather than estimated information required to reach
a decision. This planned evaluation will also determine if the additional
training will impact on the NATOPS flight evaluation in terms of accuracy
and precision of decisions similar to the findings in this study. Results
of this study will also be published as a TAEG report.
APPENDIX A
WALD BINOMIAL PROBABILITY RATIO TEST

The Wald binomial probability ratio test was developed by Wald (1947) as 
a means of making statistical decisions using as limited a sample as possible. 
The procedure involves the consideration of two hypotheses: 

\[ H_0 : P = P_1 \quad \text{and} \quad H_1 : P = P_2 \]

where $P$ is the proportion of nondefectives in the collection under consideration,
$P_1$ is the minimum proportion of nondefectives at or below which the collection
is rejected, and $P_2$ is the desired proportion of nondefectives, at or
above which the collection is accepted. Since a simple hypothesis is being
tested against a simple alternative, the basis for deciding between $H_0$ and
$H_1$ may be tested using the likelihood ratio:

\[ \frac{P_{2n}}{P_{1n}} = \frac{(P_2)^{d_n} (1 - P_2)^{n - d_n}}{(P_1)^{d_n} (1 - P_1)^{n - d_n}} \]

Where: $P_1$ = Minimum proportion of nondefectives at or below which the
collection is rejected.

$P_2$ = Desirable proportion of nondefectives at or above which the
collection is accepted.

$n$ = Total items in collection.

$d_n$ = Total nondefectives in collection.

The sequential testing procedure provides for a postponement region
based on prescribed values of alpha (α) and beta (β) that approximate the
two types of errors found in the statistical decision process. To test the
hypothesis $H_0 : P = P_1$, calculate the likelihood ratio and proceed as follows:

1. If $\dfrac{P_{2n}}{P_{1n}} \le \dfrac{\beta}{1 - \alpha}$, accept $H_0$.

2. If $\dfrac{P_{2n}}{P_{1n}} \ge \dfrac{1 - \beta}{\alpha}$, accept $H_1$.

3. If $\dfrac{\beta}{1 - \alpha} < \dfrac{P_{2n}}{P_{1n}} < \dfrac{1 - \beta}{\alpha}$, take an additional observation.

These three decisions relate well to the task proficiency problem. We
may use the following rules:

1. Accept the hypothesis that the grade of P is accumulated in lower
proportions than acceptable performance would indicate.

2. Reject the hypothesis that the grade of P is accumulated in lower
proportions than acceptable performance would indicate. By rejecting this
hypothesis an alternative hypothesis is accepted that the grade of P is
accumulated in proportions equal to or greater than desired performance.

3. A decision cannot be made; an additional trial is taken.

The following equations are used to calculate the decision regions:

\[ d_n \le \frac{\log\frac{\beta}{1-\alpha}}{\log\frac{P_2}{P_1} - \log\frac{1-P_2}{1-P_1}} + n\,\frac{\log\frac{1-P_1}{1-P_2}}{\log\frac{P_2}{P_1} - \log\frac{1-P_2}{1-P_1}} \]

\[ d_n \ge \frac{\log\frac{1-\beta}{\alpha}}{\log\frac{P_2}{P_1} - \log\frac{1-P_2}{1-P_1}} + n\,\frac{\log\frac{1-P_1}{1-P_2}}{\log\frac{P_2}{P_1} - \log\frac{1-P_2}{1-P_1}} \]

Where

$d_n$ = Accumulation of trials graded as "P" in the sequence.

$n$ = Total trials presented in the sequence.

$P_1$ = Lowest acceptable proportion of proficient trials (P) required
to pass the NATOPS flight evaluation with a grade of "Qualified."

$P_2$ = Desirable proportion of proficient trials (P).

Alpha (α) = The probability of making a type I error (deciding a student
is proficient when in fact he is not proficient).

Beta (β) = The probability of making a type II error (deciding a student
is not proficient when in fact he is proficient).

The first term of the two equations determines the intercepts of the
two linear equations; the intercepts depend on the values of alpha (α)
and beta (β).

The second term of the equations determines the slopes of the linear
equations. Since the second term is the same for both equations, the result
is two parallel lines. Values of $P_1$ and $P_2$ as well as differences
between $P_1$ and $P_2$ affect the slope of the lines. This is easily translated
into task difficulty. As $P_2$ values increase, indicating easier tasks, the
slope becomes more steep. This in turn results in fewer trials required in
the sample to reach a decision.

As differences in $P_1$ and $P_2$ increase, the slope also becomes steeper and
the uncertainty region decreases. This is consonant with rational decision
making. When the difference between the lower level of proficiency and upper
level of proficiency is great, it is easier to determine at which proficiency
level the pilot trainee is performing. The concept of differences in $P_1$
and $P_2$ is analogous to the concept of effect size in statistically testing
the difference between the means of two groups. In such statistical testing,
when alpha (α) and beta (β) remain constant, the number of observations
required to detect a significant difference may be reduced as the anticipated
effect size increases (Kalisch, 1980).
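
The decision procedure described in this appendix can be sketched in a few lines of code. The example below is an illustration only, written from the standard form of Wald's boundaries rather than from the CATES software itself: the function names are hypothetical, the "Not Proficient" label for the lower boundary is an assumed name, and the parameter values in the usage lines are those listed for the Normal Start task in appendix B (P1 = .12, P2 = .65, alpha = beta = .10).

```python
from math import log


def wald_boundaries(p1, p2, alpha, beta):
    """Return (lower intercept, upper intercept, common slope).

    The decision lines are d_n = intercept + slope * n, where d_n is the
    number of proficient trials observed in n trials.
    """
    denom = log(p2 / p1) - log((1 - p2) / (1 - p1))
    slope = log((1 - p1) / (1 - p2)) / denom
    lower = log(beta / (1 - alpha)) / denom
    upper = log((1 - beta) / alpha) / denom
    return lower, upper, slope


def sequential_decision(trials, p1, p2, alpha=0.10, beta=0.10):
    """Walk through graded trials (True = graded "P") until a boundary is hit.

    Returns (decision, number of trials used). The label "Not Proficient"
    is an assumed name for crossing the lower boundary.
    """
    lower, upper, slope = wald_boundaries(p1, p2, alpha, beta)
    proficient = 0
    for n, graded_p in enumerate(trials, start=1):
        proficient += 1 if graded_p else 0
        if proficient >= upper + slope * n:
            return "Proficient, Stop Training", n
        if proficient <= lower + slope * n:
            return "Not Proficient", n
    return "Undetermined", len(trials)


if __name__ == "__main__":
    # Normal Start parameters from appendix B.
    print(sequential_decision([True, True], p1=0.12, p2=0.65))
    print(sequential_decision([True, False], p1=0.12, p2=0.65))
    print(sequential_decision([False, False, False], p1=0.12, p2=0.65))
```

Because both boundaries share one slope, the region between them has constant width, and a run of consistent grades in either direction eventually crosses a line; mixed sequences stay in the postponement region longer.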
APPENDIX B

TASKS AND PARAMETER VALUES USED IN EVALUATION

TASKS AND TASK PARAMETERS USED IN EVALUATION

                                                       Parameters
     Task Description                              α      β      P1     P2
 1.  Normal Landing                               .10    .10    .18    .70
 2.  Normal Approach                              .10    .10    .18    .78
 3.  Free Stream Recovery                         .10    .10    .12    .55
 4.  Single Engine Approach                       .10    .10    .18    .75
 5.  Single Engine Landing                        .10    .10    .53    .70
 6.  Single Engine Malfunction Analysis           .10    .10    .06    .51
 7.  ASE Off Landing                              .10    .10    .30    .86
 8.  Alternate Approach Pilot Procedures          .10    .10    .18    .69
 9.  Windline SAR Pilot Procedures                .10    .10    .38    .80
10.  Normal Start                                 .10    .10    .12    .65
11.  Rotor Engagement                             .10    .10    .47    .75
12.  Single Engine Malfunction Takeoff Abort      .10    .10    .41    .75
13.  Automatic Approach Pilot Procedures          .10    .10    .24    .90
14.  Servo Malfunction                            .10    .10    .25    .62
15.  Manual Throttle                              .10    .10    .35    .51
16.  ASE Malfunction                              .10    .10    .35    .62
17.  SAR Manual Approach                          .10    .10    .25    .80
18.  Shutdown Checklist                           .10    .10    .29    .91
APPENDIX C

TASKS AND LEVEL OF DIFFICULTY USED TO
EVALUATE EFFICIENCY

Level of Difficulty    Tasks

Easy                   Normal Start
                       Shutdown Checklist
                       Normal Landing

Medium                 SAR Manual Approach
                       Alternate Approach Pilot Procedures
                       Single Engine Malfunction Takeoff Abort

Difficult              Windline Search and Rescue Pilot Procedure
                       ASE Off Landing
                       Freestream Recovery
APPENDIX D
SAMPLE GRADE CARD FOR DATA RECORDING

HS-1 (TAEG) TRAINING FORM REV. 2 (11 DEC 80)

[Sample grade card, page 1. The grading grid is not recoverable; legible
header fields and task entries are listed. Illegible codes are shown as -----.]

Header fields: INST, DATE, PILOT TIME, COPILOT TIME

Task Code    Task
DA200        COUPLER DOPPLER CHECK
-----        NITE LIGHTING PROCEDURE
-----        INSTRUMENT TAKEOFF
-----        INSTRUMENT DEPARTURE
-----        PRE-DIP CHECKLIST
DC100        ALTERNATE APPROACH PILOT PROCEDURES (INTRO)
-----        ALTERNATE APPROACH COPILOT PROCEDURES
EB300        HOVER DEPARTURE PROCEDURES
DA500        SONAR DEPLOYMENT VOICE PROCEDURES
DE100        FREESTREAM RECOVERY
-----        TACAN APPROACH
-----        MISSED APPROACH
CE300        MANUAL THROTTLE
CE500        SINGLE ENGINE MALFUNCTION ANALYSIS

MALFUNCTIONS/EMERGENCIES (GRADE IF GIVEN)

Task Code    Task
FA756        ELECTRICAL FIRE
DE912        BEEPER TRIM FAILURE
-----        FUEL CONTROL CONTAMINATION
FB878        ASE MALFUNCTION (.879 TO .890)
DE938        RADAR ALTIMETER FAILURE
FD835/836    COMPRESSOR STALL
-----        LUBE PUMP SHAFT FAILURE
-----        P-3 SIGNAL LOSS
FA751        GENERATOR FAIL (.751/752)
DE200        SONAR RAISE MALFUNCTIONS
-----        BOTTOMED DOME
DE500        HUNG DOME

[Sample grade card, page 2. Legible field labels are listed.]

TASK CODE
COCKPIT PROCEDURE
PREPARATION
HEADWORK
AUTO AND ALTERNATE APPROACHES
HOVER DEPARTURE PROCEDURES, MANUAL CLIMBOUT
SWIMMER DEPLOYMENT PROCEDURES (40 FOOT HOVER, 15 FOOT HOVER AND 10 FOOT
10 KNOT APPROACH)
SYSTEMS KNOWLEDGE: COUPLER, LIGHTING
TASK CODE    TASK COUNTS
TRAINING OFFICER REVIEW
INSTRUCTOR SIGNATURE
APPENDIX E
NATOPS WORKSHEET

H-3 PILOT NATOPS EVALUATION WORKSHEET
(Rev. 9-79)

PILOT
FLIGHT DATE                  GRADE
FLIGHT DURATION              BUNO
SIDE NUMBER
OPEN BOOK EXAM DATE          GRADE
CLOSED BOOK EXAM DATE        GRADE
ORAL EXAM DATE               GRADE

OVERALL FINAL GRADE

EVALUATOR

NOTES:

1. A grade of unqualified in any critical area/sub area
will result in an overall grade of unqualified for the
flight.

2. A grade of conditionally qualified in a critical area
will result in an overall grade of conditionally
qualified for the flight.

3. Only the numbers 0, 2, or 4 will be assigned to sub
areas. No interpolation is allowed.

Unqualified                0.0

Conditionally Qualified    2.0

Qualified                  4.0
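
Notes 1 and 2 amount to a simple precedence rule for the overall flight grade. The sketch below is a hypothetical illustration of that rule only: the function name is invented, and because the worksheet notes do not say how the overall grade is set when no critical area is graded below Qualified, the function returns None in that case rather than guessing.

```python
SUB_AREA_POINTS = (0, 2, 4)  # the only values allowed by note 3


def overall_from_critical_areas(critical_area_grades):
    """Apply notes 1 and 2 to a list of critical-area grade names.

    Returns "Unqualified" or "Conditionally Qualified" when a note applies,
    and None when the notes alone do not determine the overall grade.
    """
    if "Unqualified" in critical_area_grades:
        return "Unqualified"  # note 1 takes precedence
    if "Conditionally Qualified" in critical_area_grades:
        return "Conditionally Qualified"  # note 2
    return None  # not specified by the worksheet notes


def is_valid_sub_area_score(score):
    """Note 3: only 0, 2, or 4 may be assigned; no interpolation."""
    return score in SUB_AREA_POINTS


if __name__ == "__main__":
    print(overall_from_critical_areas(["Qualified", "Conditionally Qualified"]))
    print(is_valid_sub_area_score(3))
```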
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GRADE PILOTS ORAL EM ERGENCY WORKSHFI-T 

1. Electrical Malfunctions. 

I a. Generators- 

b. Electrical fire 

2. ASE Malfunctions 

a. Pitch 

b. Roll 

c. Collective 

d. Yaw 

3. Transmission Malfunctions 

*a. Chip detected 

*b. Pressure loss 

c. Tail takeoff 

d. Torque system 

4. Engine Malfunctions 

*a. Engine fire

*b. Flex shaft

c. Oil pressure 

d. Oil temperature 

e. Hot start 

f. Post shutdown fire 

g. PMS 

5. Rotary Rudder Malfunctions 

*a. Tail Rotor control/drive loss 

*b. TGB/IGB chip light

6. Fuel System Malfunctions 

a. Fuel filter bypass

b. Fuel boost pump

7. Hydraulic Malfunctions

a. Primary

b. Auxiliary

c. Utility

d. Sensing unit

8. Water Operations

*a. Water landing

b. Water takeoff

c. Fuel dumping

9. Rotor Brake Malfunctions

a. Inflight 

b. Shutdown 
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PILOTS ORAL EMERGENCY WORKSHEET GRADE 

10. Discussion Items

a. Power settling

*b. Blade stall

*c. Dynamic rollover

d. Sonar hoist

e. MAD reeling machine

f. AKT-22 antenna



GENERAL COMMENTS OVERALL GRADE . 



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PILOT EVALUATIONS WORKSHEET 

Area I. Ground Operations 

a. Brief/debrief/flight gear 

b. Records check * 

c. Preflight/postflight * 

d. Checklist procedures/systems check 

e. Start/engagement

f. Taxi/lookout

g. Disengagement/shutdown

h. General



Area I. Ground Operations 

CONDITIONALLY QUALIFIED. Did not fully instruct or debrief the crew.
Flight equipment improperly worn or in marginal condition. Did not fully
examine flight records. Minor omissions or errors on preflight or postflight.
Improper or incomplete use of checklists. Nonstandard procedures.
Inattention or misinterpretation of visual signal. Rough or erratic start,
engagement, disengagement or shutdown.

UNQUALIFIED. Did not conduct brief or debrief. Flight equipment missing,
not worn or in an unsafe condition. Failed to sign for aircraft or
accepted aircraft with grounding discrepancy. Failed to note or record
downing discrepancy after flight. Any omission or error on preflight or
postflight which would affect safety of flight. Exceeded published
limitations during start, engagement, disengagement or shutdown. Did not
utilize checklist or perform required systems checks. Marginal control of
helicopter while taxiing. Ignored visual signal. Did not use pre-takeoff
checklist.



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Area II. Normal Flight Operations 

a. Checklist procedures 

b. Transition/climb

c. Cruise flight

d. Systems knowledge/usage

e. Normal landings/takeoffs

f. Hover/low work 

g. General 



Area II. Normal Flight Operations 

CONDITIONALLY QUALIFIED. [illegible] or landing checklist. Application of
power erratic but did not exceed limitations. Unable to maintain altitude
within ±50 feet of assigned [illegible] airspeed within ±10 knots.
[illegible] degrees between final approach and landing. Hover altitude
15 feet -5 feet. Unable to fully explain aircraft systems or limitations.

UNQUALIFIED. [illegible] touchdown/takeoff. Unsatisfactory knowledge of
aircraft systems or limitations.
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Area III. Emergency Operations

a. Autorotations

b. Single engine landings/waveoffs

c. AUX off landings

d. ASE off landings

e. Emergency procedures

f. General



Area III. Emergency Operations

CONDITIONALLY QUALIFIED. Did not pre-brief co-pilot on autorotations.
Airspeed, Nr and heading control erratic. Groundspeed exceeded 15
knots or slight drift at recovery. Did not establish and maintain minimum
safe single engine speed on landings or waveoffs. Minor difficulty in
controlling Nr during single engine. Power, heading and altitude control
erratic during AUX or ASE off flight. Did not fully comply with emergency
procedures but did not jeopardize aircraft or crew.

UNQUALIFIED. During autorotation did not call for full power. Airspeed,
Nr and heading control beyond safe limits. Implemented techniques that
would have jeopardized the successful completion and recovery of the
autorotation. Failed to call for full power/PMS off during single engines.
Failed to note or correct low/unsafe Nr conditions during single engine.
Exceeded rate of descent limits during single engine approach or engine
limits. ASE off and AUX off flight unsafe or excessive lateral drift/rate
of descent on touchdown. Failed to comply with established emergency
procedures which resulted in jeopardizing aircraft/crew or exceeded
engine/airframe limitations.
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Area IV. Coupler/Sonar Operations (Hooded)

b. Automatic approach

c. Alternate approach

d. Climb out

e. Coupler/sonar emergencies

f. General



Area IV. Coupler/Sonar Operations (Hooded)

CONDITIONALLY QUALIFIED. Minor deviations from established checklist
and [illegible] procedures. Erratic control of [illegible] during
automatic, alternate approach and climb out. Erratic altitude control of
150 feet [illegible] feet. Reacted slowly to emergencies. Unable to fully
explain systems or limitations.

UNQUALIFIED. Checklist not used or unsafe/improper procedures
utilized. Allowed aircraft to descend through 30 feet in hover without
attempting to correct. Made omissions or errors in emergency procedures
that could jeopardize aircraft or crew. Attempted to hover downwind
without correcting. Unsatisfactory knowledge of systems or procedures.
Unable to consistently maintain 150 ±30 feet while hooded.
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Area V. Search and Rescue Operations

a. Navigation

b. IFR procedures (Hooded)

c. VFR procedures

d. Crew/cockpit coordination

e. General



Area V. Search and Rescue Operations (Hooded)

CONDITIONALLY QUALIFIED. No [illegible] of [illegible] lookout
doctrine. Used nonstandard voice, approach, pattern or hoist procedures
but none which would seriously affect the mission. Did not [illegible]
properly utilize copilot, crew and systems in accomplishing rescue.

UNQUALIFIED. Could not follow wind line rescue pattern. Hovered
downwind without correcting. Unable to consistently maintain [illegible]
feet hooded. Allowed aircraft to descend below [illegible] feet during
approach/hover without correcting. Exceeded aircraft limitations or
procedures that would have jeopardized aircraft or crew.
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APPENDIX F 

MATHEMATICAL EQUATION FOR ESTIMATING TRIALS TO REACH 
STOP TRAINING DECISION FOR THE CATES DECISION MODEL 
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MATHEMATICAL EQUATION USED TO ESTIMATE TRIALS TO A 
"PROFICIENT, STOP TRAINING" DECISION 



\[
\text{Additional Estimated Trials to } P_2 =
\frac{\beta \log \dfrac{\beta}{1-\alpha} + (1-\beta) \log \dfrac{1-\beta}{\alpha}}
     {(1-P_2) \log \dfrac{1-P_2}{1-P_1} + P_2 \log \dfrac{P_2}{P_1}}
\]



Overall Estimated Trials = Trials Performed by Student + Additional
Estimated Trials to P2 (estimated trials required to cross
the upper boundary)
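
As a rough numerical illustration, the Python sketch below evaluates this estimate. It reads the appendix equation as Wald's expected-sample-size expression for a sequential test, with numerator β log(β/(1−α)) + (1−β) log((1−β)/α) and denominator (1−P2) log((1−P2)/(1−P1)) + P2 log(P2/P1). That reading, the function names, and the example parameter values are assumptions made for illustration, not values taken from the study.

```python
import math


def additional_trials_to_p2(alpha: float, beta: float, p1: float, p2: float) -> float:
    """Estimated additional trials to cross the upper (proficient) boundary.

    alpha, beta: assumed error probabilities of the sequential test.
    p1, p2: assumed lower and upper proficiency proportions, with p1 < p2.
    """
    if not (0.0 < alpha < 1.0 and 0.0 < beta < 1.0):
        raise ValueError("alpha and beta must lie strictly between 0 and 1")
    if not (0.0 < p1 < p2 < 1.0):
        raise ValueError("require 0 < p1 < p2 < 1")

    numerator = (beta * math.log(beta / (1.0 - alpha))
                 + (1.0 - beta) * math.log((1.0 - beta) / alpha))
    denominator = ((1.0 - p2) * math.log((1.0 - p2) / (1.0 - p1))
                   + p2 * math.log(p2 / p1))
    # The ratio does not depend on the logarithm base.
    return numerator / denominator


def overall_estimated_trials(trials_performed: int, alpha: float, beta: float,
                             p1: float, p2: float) -> float:
    """Trials already performed by the student plus the additional estimate."""
    return trials_performed + additional_trials_to_p2(alpha, beta, p1, p2)


if __name__ == "__main__":
    # Hypothetical parameters chosen only to exercise the formula.
    print(overall_estimated_trials(3, alpha=0.10, beta=0.10, p1=0.50, p2=0.80))
```

Because both numerator and denominator are sums of logarithm terms, using natural or base-10 logarithms gives the same estimate.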
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